
Conversation

aelovikov-intel
Contributor

@aelovikov-intel aelovikov-intel commented Sep 26, 2025

Compilation of `#include <sycl/sycl.hpp>` is slow, and that is especially problematic for SYCL RTC (run-time compilation). One way to overcome this is fine-grained includes, which are being pursued separately. Another way is to employ clang's precompiled-header support, which is what this PR does. The two approaches can be combined, and this PR adds `test-e2e/PerformanceTests/KernelCompiler/auto-pch.cpp` to give some idea of the PCH impact. The test shows the PCH benefit when compiling some of the fine-grained includes on top of the absolute minimum required to compile SYCL RTC's "Hello world". From one of the CI runs:

| Extra Headers | Without PCH | With auto-PCH |
| --- | --- | --- |
| (none) | 176ms 137ms 136ms 136ms 136ms | 226ms 64ms 64ms 64ms 64ms |
| sycl/half_type.hpp | 165ms 165ms 165ms 165ms 165ms | 267ms 71ms 72ms 72ms 72ms |
| sycl/ext/oneapi/bfloat16.hpp | 174ms 173ms 173ms 173ms 173ms | 279ms 76ms 73ms 73ms 74ms |
| sycl/marray.hpp | 142ms 143ms 142ms 142ms 143ms | 235ms 66ms 66ms 66ms 66ms |
| sycl/vector.hpp | 296ms 290ms 290ms 290ms 290ms | 487ms 124ms 125ms 125ms 125ms |
| sycl/multi_ptr.hpp | 278ms 278ms 276ms 275ms 274ms | 441ms 125ms 125ms 125ms 125ms |
| sycl/builtins.hpp | 537ms 533ms 531ms 531ms 531ms | 883ms 218ms 218ms 219ms 218ms |

The table is missing a `sycl/sycl.hpp` row because that header currently crashes the FE when reading the generated PCH; the crash is being investigated/fixed separately.
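
To make the measured scenario concrete, below is a minimal sketch of the RTC path the test exercises, via the `sycl_ext_oneapi_kernel_compiler` extension. Passing `--auto-pch` through `build_options` is an assumption for illustration; the actual test may enable it differently.

```cpp
// Sketch of run-time compilation through the kernel_compiler extension.
// "--auto-pch" in build_options is an assumed spelling for illustration.
#include <sycl/sycl.hpp>

namespace syclex = sycl::ext::oneapi::experimental;

int main() {
  sycl::queue Q;

  // Device code compiled at run time; every compilation re-parses the
  // headers it includes, which is the cost the PCH is meant to amortize.
  std::string Source = R"""(
    #include <sycl/sycl.hpp>
    namespace syclexp = sycl::ext::oneapi::experimental;
    extern "C"
    SYCL_EXT_ONEAPI_FUNCTION_PROPERTY((syclexp::nd_range_kernel<1>))
    void iota(float *Ptr) {
      size_t Id = sycl::ext::oneapi::this_work_item::get_nd_item<1>()
                      .get_global_linear_id();
      Ptr[Id] = static_cast<float>(Id);
    }
  )""";

  auto SrcBundle = syclex::create_kernel_bundle_from_source(
      Q.get_context(), syclex::source_language::sycl, Source);
  // The first build pays for generating the PCH; repeated builds reuse it.
  auto ExeBundle = syclex::build(
      SrcBundle, syclex::properties{syclex::build_options{
                     std::vector<std::string>{"--auto-pch"}}});
  sycl::kernel K = ExeBundle.ext_oneapi_get_kernel("iota");
  (void)K;
}
```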

Implementation-wise, I'm reusing the existing upstream `clang::PrecompiledPreamble` with one minor modification. `PrecompiledPreamble`'s main usage seems to be tools like clangd, so it ignores errors in the code. I've modified it so that those errors break PCH generation the same way they would break a normal compilation. I'm also not sure if we'd want that long-term, because making such "auto-PCH" persistent would deviate from the upstream version of `PrecompiledPreamble` even more. I can imagine that in the near future we'd need to "fork" it into a separate utility. Still, it seems fine as a first step.
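
For reference, the rough shape of the upstream API being reused; parameter lists are approximate and differ across LLVM versions, see `clang/include/clang/Frontend/PrecompiledPreamble.h` for the real declarations.

```cpp
// Rough shape of building/reusing a preamble PCH with clang's upstream
// PrecompiledPreamble (approximate signatures; not the PR's actual code).
#include "clang/Frontend/CompilerInvocation.h"
#include "clang/Frontend/PrecompiledPreamble.h"

using namespace clang;

llvm::ErrorOr<PrecompiledPreamble>
buildAutoPCH(const CompilerInvocation &Invocation,
             llvm::MemoryBuffer &MainFile, DiagnosticsEngine &Diags,
             llvm::IntrusiveRefCntPtr<llvm::vfs::FileSystem> VFS) {
  // The "preamble" is the leading run of directives (#include etc.) in the
  // main file; only that region gets baked into the PCH.
  PreambleBounds Bounds = ComputePreambleBounds(Invocation.getLangOpts(),
                                                MainFile, /*MaxLines=*/0);

  PreambleCallbacks Callbacks; // default no-op callbacks suffice here
  return PrecompiledPreamble::Build(
      Invocation, &MainFile, Bounds, Diags, VFS,
      std::make_shared<PCHContainerOperations>(),
      /*StoreInMemory=*/true, /*StoragePath=*/"", Callbacks);
}
// A later compile checks Preamble.CanReuse(...) and, if so, calls
// Preamble.AddImplicitPreamble(CI, VFS, &MainFile) so the front end picks
// up the PCH instead of re-parsing the headers.
```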

Driver modifications are for the `--auto-pch` option support, which should only be present on the SYCL RTC path and not for regular clang invocations from the command line. I'm relatively confident those will stay in the future.

@aelovikov-intel aelovikov-intel changed the title [DRAFT][SYCL RTC] Implement --auto-pch support [SYCL RTC] Introduce --auto-pch support Sep 30, 2025
@premanandrao
Contributor

> Is it safe to merge this PR with the expectation that the FE will be fixed before the next release? Or, should we hold this PR until there is a fix?

I think this is a reasonable expectation to have. Please don't hold this PR for that fix.

@aelovikov-intel aelovikov-intel merged commit a44f116 into intel:sycl Oct 8, 2025
53 of 54 checks passed
@aelovikov-intel aelovikov-intel deleted the sycl-rtc-auto-pch branch October 8, 2025 17:57
aelovikov-intel added a commit to aelovikov-intel/llvm that referenced this pull request Oct 15, 2025
Built on top of `--auto-pch` (in-memory) introduced in
intel#20226.

The most significant technical decision was how to implement the
filesystem cache. I've looked into the following options:

* `sycl/source/detail/persistent_device_code_cache.hpp`
  Also, see `sycl/doc/design/KernelProgramCache.md`. It seems to be tailored
  to very specific usage scenarios; it would be very resource-consuming to
  split it into a generic data structure that could then be used for two
  different use cases.

  This cache is disabled by default and I'm not sure how well-tested it
  is. Also, using plain ".lock" files for "advisory locking" instead of
  the native filesystem mechanisms (e.g., the locking APIs in
  `fcntl`/`flock`/`CreateFile`/`LockFileEx`; see the sketch after this
  list) made me question whether it's worth generalizing and how much
  work would be necessary there.

* `llvm/include/llvm/Support/Caching.hpp`
  Originally implemented as part of ThinLTO, later moved into
  `LLVMSupport` with the following commit message:

    > We would like to move ThinLTO’s battle-tested file caching
    > mechanism to the LLVM Support library so that we can use it
    > elsewhere in LLVM.

  The API is rather unexpected, so my research didn't stop there.

* `lldb/include/lldb/Core/DataFileCache.h`
  Uses `LLVMSupport`'s caching from the previous bullet under the hood,
  but provides an easier-to-grasp API. If we were developing upstream, I
  think uplifting that abstraction into the `LLVMSupport` library and then
  using it in both `lldb` and `libsycl` would probably be the choice I'd
  vote for. However, doing that downstream was too much effort, so I
  ultimately decided not to go with this approach.

  That cache also has a `std::mutex` on the "hot"
  `DataFileCache::GetCachedData` path, I presume to avoid creating the
  same entry from multiple threads.
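
For illustration, native advisory locking on POSIX looks roughly like the sketch below. This is not code from any of the caches discussed; it only shows the `flock`-style mechanism mentioned in the first bullet, as a contrast to ".lock" marker files.

```cpp
// Sketch: POSIX advisory locking with flock(2). Illustration only.
#include <fcntl.h>
#include <sys/file.h>
#include <unistd.h>

bool withExclusiveLock(const char *Path, bool (*Body)(int Fd)) {
  int Fd = open(Path, O_RDWR | O_CREAT, 0644);
  if (Fd < 0)
    return false;
  if (flock(Fd, LOCK_EX) != 0) { // blocks until the lock is granted
    close(Fd);
    return false;
  }
  bool Ok = Body(Fd); // read/update the cache entry under the lock
  flock(Fd, LOCK_UN); // release; also released automatically on close
  close(Fd);
  return Ok;
}
```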

In the end, I've chosen to use `LLVMSupport`'s quirky (or maybe I just
haven't grown enough to appreciate it) caching API directly, and that's
what is done in this PR. Unlike `lldb`'s cache, I decided to trade the
possible duplicate work of building the preamble on a cache miss from
concurrent threads in favor of no inter-thread synchronization (not
profiled/measured, though) on the cache-hit path and implementation simplicity.
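
The shape of that API, roughly: asking the cache for a key returns another callback; a null callback means a hit (the `AddBuffer` callback already received the cached data), a non-null one means a miss and the caller streams the freshly built data into it. A hedged sketch based on `llvm/Support/Caching.h` follows; signature details (such as the `ModuleName` parameters) vary between LLVM versions, and the cache directory and PCH-building helper are hypothetical.

```cpp
// Sketch of the llvm::localCache protocol (see llvm/Support/Caching.h).
#include "llvm/Support/Caching.h"
#include "llvm/Support/MemoryBuffer.h"

using namespace llvm;

Error getOrBuildPCH(StringRef Key, StringRef FreshPCHBytes,
                    std::unique_ptr<MemoryBuffer> &Out) {
  // AddBuffer receives the cached (or freshly committed) contents.
  Expected<FileCache> Cache = localCache(
      "sycl-rtc-pch", "pch", "/tmp/sycl-pch-cache" /*hypothetical dir*/,
      [&](unsigned /*Task*/, const Twine & /*ModuleName*/,
          std::unique_ptr<MemoryBuffer> MB) { Out = std::move(MB); });
  if (!Cache)
    return Cache.takeError();

  Expected<AddStreamFn> AddStream = (*Cache)(/*Task=*/0, Key, "pch");
  if (!AddStream)
    return AddStream.takeError();
  if (!*AddStream)
    return Error::success(); // hit: Out was populated via AddBuffer

  // Miss: stream the data out; destroying the returned stream commits the
  // cache entry and invokes AddBuffer with the final contents.
  Expected<std::unique_ptr<CachedFileStream>> FileOrErr =
      (*AddStream)(/*Task=*/0, "pch");
  if (!FileOrErr)
    return FileOrErr.takeError();
  *(*FileOrErr)->OS << FreshPCHBytes;
  return Error::success();
}
```

Note there is no locking anywhere on the hit path above, which is exactly the trade-off described: concurrent threads missing on the same key may each build the preamble once, but hits are synchronization-free.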